IDa-Det: An Information Discrepancy-Aware Distillation for 1-bit Detectors

FIGURE 6.17
On VOC, we (a) select μ on the raw detector and different KD methods, including Hint [33], FGFI [235], and IDa-Det; (b) select λ and γ on IDa-Det with μ set to 1e4. (Panel (a): effect of μ; panel (b): effect of λ and γ.)

by 2.5%, 2.4%, and 1.8% over non-distillation, Hint, and FGFI, respectively, under the same student-teacher framework. We then evaluate the proposed entropy distillation loss against the conventional ℓ2 loss, the inner-product loss, and the cosine-similarity loss. As shown in Table 6.5, our entropy distillation loss improves the distillation performance by 0.4%, 0.3%, and 0.4% over the ℓ2 loss with the Hint, FGFI, and IDa methods, respectively. Compared with the inner-product and cosine-similarity losses, the entropy loss outperforms them by 2.1% and 0.5% mAP in our framework, which further reflects the effectiveness of our method.
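For reference, the three baseline objectives can be written as simple feature-matching losses between paired student and teacher proposal features. The sketch below is a minimal illustration, not the chapter's implementation: the function names and the (N, D) proposal-feature shape are our assumptions, and the entropy distillation loss itself follows the definition given earlier in the chapter rather than being reproduced here.

```python
import torch
import torch.nn.functional as F

def l2_loss(f_s: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
    # Conventional \ell_2 loss: mean squared error between paired
    # student (f_s) and teacher (f_t) proposal features of shape (N, D).
    return F.mse_loss(f_s, f_t)

def inner_product_loss(f_s: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
    # Inner-product loss: encourage aligned features by maximizing the
    # dot product of each pair, i.e., minimizing its negative mean.
    return -(f_s * f_t).sum(dim=1).mean()

def cosine_similarity_loss(f_s: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
    # Cosine-similarity loss: 1 - cos(f_s, f_t), averaged over proposals;
    # sensitive only to feature direction, not magnitude.
    return (1.0 - F.cosine_similarity(f_s, f_t, dim=1)).mean()
```

In Table 6.5 these correspond to the ℓ2, inner-product, and cosine-similarity rows, while the "Entropy loss" rows use the entropy distillation loss proposed in this chapter.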

TABLE 6.5

The effects of different components in IDa-Det with the Faster-RCNN
model on the PASCAL VOC dataset.

Model            Proposal selection   Distillation method   mAP
Res18            –                    –                     78.6
BiRes18          –                    –                     74.0
Res101-BiRes18   Hint                 ℓ2                    74.1
Res101-BiRes18   Hint                 Entropy loss          74.5
Res101-BiRes18   FGFI                 ℓ2                    74.7
Res101-BiRes18   FGFI                 Entropy loss          75.0
Res101-BiRes18   IDa                  Inner-product         74.8
Res101-BiRes18   IDa                  Cosine similarity     76.4
Res101-BiRes18   IDa                  ℓ2                    76.5
Res101-BiRes18   IDa                  Entropy loss          76.9

Note: Hint [33] and FGFI [235] are compared against our information discrepancy-aware
proposal selection (IDa). IDa and the entropy loss denote the main components of the
proposed IDa-Det.
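Similarly, the proposal-selection component can be pictured as scoring paired teacher-student proposals by an information-discrepancy measure and distilling only the most discrepant pairs. The sketch below is an assumption-laden illustration: the diagonal-covariance Mahalanobis-style score and both helper names are hypothetical stand-ins for the chapter's actual IDa measure.

```python
import torch

def discrepancy_score(f_s: torch.Tensor, f_t: torch.Tensor,
                      eps: float = 1e-6) -> torch.Tensor:
    # Hypothetical Mahalanobis-style score: squared feature difference
    # weighted by per-channel teacher variance; returns shape (N,).
    diff = f_t - f_s
    var = f_t.var(dim=0, unbiased=False) + eps
    return (diff.pow(2) / var).sum(dim=1)

def select_discrepant_pairs(f_s: torch.Tensor, f_t: torch.Tensor, k: int):
    # Keep the k proposal pairs carrying the largest information
    # discrepancy; only these pairs feed the distillation loss.
    idx = torch.topk(discrepancy_score(f_s, f_t),
                     k=min(k, f_s.size(0))).indices
    return f_s[idx], f_t[idx]
```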